speech processing


77.9K Publications · 4.2M Citations · 113K Authors · 10.5K Institutions

Perception-Centered Speech Processing

1948 - 1974

Speech research from 1948 to 1974 framed spoken communication as a perceptually anchored coding problem, guiding the analysis, synthesis, and evaluation of intelligibility through perceptual cues. Phonetic and formant analysis became the core toolkit, enabling automatic formant extraction, onset-time discrimination, and phoneme boundary detection within processing pipelines. Neural and brain-level studies linked speech processing to cognitive substrates and hemispheric specialization, while production–perception integration highlighted the interactive nature of articulation and linguistic structure. Early computational work framed automatic recognition and analysis-by-synthesis as foundational approaches for speech technology.

Historical Significance: The period produced foundational articulatory–acoustic models connecting vocal tract shaping to acoustic output, underpinning accounts of formant transitions and coarticulation and giving impetus to later production research, speech synthesis, and coding. Foundational perception experiments clarified cue integration and the role of audition in recognition, influencing robust ASR and auditory scene analysis. "The Perception of the Speech Code" introduced invariant cues enabling phonetic decoding, shaping cognitive models and resilient recognition. "Speech Analysis Synthesis and Perception" demonstrated the perceptual validity of vocoder-like parameterization, foreshadowing advances in coding and synthesis. The general aeroacoustic theory of sound generation provided a baseline for articulatory phonetics and early production models.

Perception-centric analyses treated speech as a coded signal whose perceptual cues guide analysis, synthesis, and interpretation, shaping how researchers modeled speech coding, processing, and intelligibility on perceptual and linguistic foundations [1], [2], [5], [9], [12], [19].

Phonetic and formant-oriented analysis emerged as the core methodological toolkit for revealing structure in voiced speech, driving automatic formant extraction, onset-time discrimination, and phoneme boundary detection within signal processing pipelines [3], [7], [9], [12], [15], [16], [19].
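Formant extraction of this kind is easiest to see in a linear-prediction (LPC) formulation, which entered the toolkit late in this period. The sketch below is a minimal illustration, not a reconstruction of any cited system: it synthesizes a vowel-like frame with known resonances at 700 and 1200 Hz, fits LPC coefficients via the autocorrelation normal equations, and reads formant frequencies off the pole angles. The sample rate, LPC order, and synthetic source are all illustrative assumptions.

```python
# Minimal LPC-based formant extraction sketch (illustrative, not a cited method).
import numpy as np

FS = 8000          # sample rate in Hz (assumed for this sketch)
LPC_ORDER = 10     # common rule of thumb: FS/1000 + 2

def lpc_coefficients(frame, order):
    """Fit LPC coefficients by solving the autocorrelation normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))          # prediction polynomial A(z)

def formants_from_lpc(a, fs):
    """Map complex poles of the all-pole model to formant frequencies in Hz."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    return sorted(f for f in freqs if f > 90)   # discard near-DC poles

# Synthesize a vowel-like frame: 100 Hz pulse train through two resonators
# with known formants F1 = 700 Hz and F2 = 1200 Hz (ground truth to recover).
n = np.arange(int(0.03 * FS))
signal = (n % (FS // 100) == 0).astype(float)   # glottal pulse train
for f, bw in [(700, 80), (1200, 100)]:          # (center Hz, bandwidth Hz)
    r = np.exp(-np.pi * bw / FS)
    b1, b2 = 2 * r * np.cos(2 * np.pi * f / FS), -r * r
    out = np.zeros_like(signal)
    for i in range(len(signal)):
        out[i] = signal[i]
        if i > 0:
            out[i] += b1 * out[i - 1]
        if i > 1:
            out[i] += b2 * out[i - 2]
    signal = out

frame = signal * np.hamming(len(signal))
print(formants_from_lpc(lpc_coefficients(frame, LPC_ORDER), FS))
# Prints estimates near 700 and 1200 Hz, plus weaker poles modeling the source.
```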

Neural and brain-level investigations linked speech processing to cognitive and neural substrates, emphasizing hemispheric asymmetry, neurolinguistics, and neural processing of noncanonical signals such as backwards speech [6], [8], [10].

Production–perception integration emphasized context, predictability, pauses, grammar use, and articulatory–phonetic relations, treating speech as an interactive system in which production, perception, and linguistic structure shape one another [4], [13], [14], [18], [20].

Early computational work framed automatic recognition and analysis-by-synthesis as the backbone of speech technology, yielding vowel recognition programs, phonetic-pattern detection, and spectral reduction for automated speech analysis [15], [17], [19].
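Early vowel recognition programs of this kind often reduced to comparing measured formant frequencies against stored reference templates. The sketch below is a hypothetical minimal version of that idea, not a reconstruction of any cited program: the reference (F1, F2) values are rough adult-male averages in the spirit of Peterson and Barney's published measurements, and the log-distance metric is an illustrative choice.

```python
# Template-style vowel classification from formants (illustrative sketch).
import math

REFERENCE_FORMANTS = {      # vowel -> (F1 Hz, F2 Hz), rough adult-male averages
    "i":  (270, 2290),      # as in "heed"
    "u":  (300,  870),      # as in "who'd"
    "a":  (730, 1090),      # as in "hod"
    "ae": (660, 1720),      # as in "had"
}

def classify_vowel(f1, f2):
    """Return the reference vowel nearest to (f1, f2) in log-frequency space."""
    def distance(ref):
        rf1, rf2 = ref
        # log frequency roughly equalizes perceptual spacing across formants
        return math.hypot(math.log(f1 / rf1), math.log(f2 / rf2))
    return min(REFERENCE_FORMANTS, key=lambda v: distance(REFERENCE_FORMANTS[v]))

print(classify_vowel(700, 1150))   # -> a
print(classify_vowel(290, 2200))   # -> i
```

The log-frequency distance is one simple way to keep the second formant, which spans a much wider range in Hz, from dominating the comparison; early systems used a variety of normalizations to the same end.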

Parametric-Probabilistic Speech Processing (Pre-HMM Era)

1975 - 1981

Parallel Interactive Speech Perception

1982 - 1988

Multiresolution Psychoacoustic Speech Processing

1989 - 1995

Neural-Cognitive Speech Integration

1996 - 2002

Auditory-Motor Speech Dynamics

2003 - 2009

End-to-End Neural Speech

2010 - 2016

End-to-End Speech Synthesis

2017 - 2023